﻿\section{Loops: several iterators}
\label{loop_iterators}

In most cases loops have only one iterator, but there could be several in the resulting code.

Here is a very simple example:

\lstinputlisting[style=customc]{\CURPATH/iterators_EN.c}

There are two multiplications at each iteration and they are costly operations.
Can we optimize it somehow?

Yes, if we notice that both array indices are jumping on values that we can easily calculate without 
multiplication.

\subsection{Three iterators}

\lstinputlisting[caption=\Optimizing MSVC 2013 x64,style=customasmx86]{\CURPATH/MSVC_2013_x64_Ox_EN.asm}

Now there are 3 iterators: the \IT{cnt} variable and two indices, which are increased by 12 and 28 at 
each iteration.
We can rewrite this code in \CCpp:

\lstinputlisting[style=customc]{\CURPATH/iterators3_EN.c}

So, at the cost of updating 3 iterators at each iteration instead of one, 
we can remove two multiplication operations.

\subsection{Two iterators}

GCC 4.9 does even more, leaving only 2 iterators:

\lstinputlisting[caption=\Optimizing GCC 4.9 x64,style=customasmx86]{\CURPATH/GCC491_x64_O3_EN.asm}

There is no \IT{counter} variable any more: GCC concluded that it is not needed.

The last element of the \IT{a2} array is calculated before the loop begins (which is easy: $cnt*7$) 
and that's how the loop is to be stopped: just iterate until the second index reaches this precalculated value.

You can read more about multiplication using shifts/additions/subtractions here: 
\myref{multiplication_using_shifts_adds_subs}.

This code can be rewritten into \CCpp like that:

\lstinputlisting[style=customc]{\CURPATH/iterators2_EN.c}

GCC (Linaro) 4.9 for ARM64 does the same, but it precalculates the last index of \IT{a1} 
instead of \IT{a2}, which, of course has the same effect:

\lstinputlisting[caption=\Optimizing GCC (Linaro) 4.9 ARM64,label=GCC_no_number_sign,style=customasmARM]{\CURPATH/ARM64_GCC_49_O3_EN.s}

% FIXME1 -> post-increment

GCC 4.4.5 for MIPS does the same:

\lstinputlisting[caption=\Optimizing GCC 4.4.5 for MIPS (IDA),style=customasmMIPS]{\CURPATH/MIPS_O3_IDA_EN.lst}

\subsection{Intel C++ 2011 case}
\myindex{\CompilerAnomaly}
\label{loops_iterators_loop_anomaly}

Compiler optimizations can also be weird, but nevertheless, still correct.
Here is what the Intel C++ compiler 2011 does:

\lstinputlisting[caption=\Optimizing Intel C++ 2011 x64,style=customasmx86]{\CURPATH/intel_2011_x64_Ox.asm}

First, there are some decisions taken, then one of the routines is executed.

Looks like it is a check if arrays intersect.

This is very well known way of optimizing memory block copy routines.
But copy routines are the same!

This is has to be an error of the Intel C++ optimizer, which still produces workable code, though.

We intentionally considering such example code in this book so the reader would understand that compiler output is weird at times,
but still correct, because when the compiler was tested, it passed the tests.
